Abstract



Operational procedures for the Graduate Record Examinations Validity Study Service 
(GRE VSS) are reviewed, with emphasis on the frequent occurrence of 
negative coefficients in the fitted within-department regressions obtained by the empirical 
Bayes method of Braun and Jones (1985). Several alterations of the operational procedures 
are proposed that would reduce the frequency of negative coefficients and, if desired, 
eliminate them completely. It is argued, however, that there are no a priori reasons for 
assuming that all the coefficients are nonnegative. Reports of the fitted within-department 
regressions should be based on a single model, selected by model exploration. 
The estimation procedures could be improved by employing more flexible software for 
modelling between-department variation. 

1. Background



The GRE Validity Study Service (GRE VSS) provides participating graduate school 
departments with an array of information about the association of the first-year grade 
average (FYA), a measure of academic performance, with the GRE verbal (V), quantitative 
(Q), and analytical (A) scores, and the undergraduate grade-point average (U). The report 
to a department consists of two parts: 

1. A regression formula with the estimates of the regression coefficients of FYA on V, 
Q, A, and U. 

2. An expectancy table: estimated distributions of FYA based on the predicted FYA 
(pFYA). 

The departments can use the estimated regression formula and the expectancy table 
to assess the relative importance of various admission measures and to predict the success 
of the applicants. Some departments may use these formulas to adjust their admission rules. 

Under normal circumstances it is expected that the regression formula would have 
all four coefficients nonnegative, and that the distribution of the outcomes for each feasible 
value of pFYA would be unimodal. Examples to the contrary are: 

Regression formula:¹

pFYA = 2.8 + .09V + .14Q - .02A + .04U 

Expectancy table for pFYA = 3.0 (one row of the table): 

pFYA | < 2.5 | 2.5 - 3.0 | 3.0 - 3.2 | 3.2 - 3.4 | 3.4 - 3.6 | 3.6 - 3.8 | > 3.8
3.0  |  29   |    21     |    21     |     9     |    10     |     6     |   4

¹ In order to simplify the presentation, the scores V, Q, and A are defined on the scale 
1 - 4, obtained by dividing the original GRE scores by 200. 

In this example the negative coefficient on A has no straightforward interpretation 
and can be explained only by a reference to the complex processes of selection and self- 
selection of students into departments and the idiosyncratic influences of the 
department/school environment on the students. The example of the expectancy table, 
containing a percentage breakdown of students with a specific value of pFYA into bands of 
FYA scores, appears to suggest that a student with predicted FYA of 3.0 is slightly more 
likely to have an eventual FYA in the range 3.4 - 3.6 than in the range 3.2 - 3.4, even though 
the latter range is closer to the actual prediction. This is clearly a contradictory outcome, 
indicating problems at some stages of the statistical analysis. 
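
The unimodality expectation can be checked mechanically for any row of an expectancy table. The following is a minimal sketch, not part of the operational GRE VSS software; the function name `count_modes` is hypothetical, and the row is the one quoted above, with the last entry read as 4 so that the percentages sum to 100 (an assumption).

```python
def count_modes(percentages):
    """Count local maxima in a row of band percentages (a plateau counts once)."""
    modes = 0
    rising = True  # treat the left edge as an ascent so that a leading peak is counted
    for prev, curr in zip(percentages, percentages[1:]):
        if curr < prev and rising:
            modes += 1          # a descent right after an ascent closes a peak
            rising = False
        elif curr > prev:
            rising = True
    if rising:
        modes += 1              # the row ends on an ascent or a plateau
    return modes


# Row for pFYA = 3.0; the final entry 4 is an assumed reading of the table.
row = [29, 21, 21, 9, 10, 6, 4]
n_modes = count_modes(row)
print(n_modes, "mode(s); unimodal:", n_modes == 1)
```

A row that passes this check has a single peak; the quoted row fails because the band 3.4 - 3.6 exceeds its left neighbour.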

2. Purpose of the Study 

The main purpose of the study reported here is to explore the sources of 
these aberrant features of the statistical analyses on which the GRE VSS reports are 
based and to devise alterations to the currently used procedures that would integrally 
produce nonnegative coefficients for all the departments. 

The currently used procedures are based on a hierarchy of empirical Bayes 
models. For each data set, 16 empirical Bayes models are fitted. The software for 
model fitting employs the method of Braun and Jones (1985). In the fitted models each 
of the four regression coefficients is either constrained to zero for every department or 
department-specific coefficients are estimated. The estimates of the sets of five 
coefficients (the four variables and the intercept) may vary across the departments. The 
distribution of a coefficient across the departments is characterized by its mean and 
variance. Estimating department-specific regression formulas is the underpinning of the 
Validity Study Service, since its purpose is to describe department-specific characteristics. 

For each department, estimated regression coefficients are reported from one of 
the 16 models, in which each regression coefficient is nonnegative. Such a procedure 
may involve bias due to selection of the reported formula. The size and importance of 
this model selection bias depend on the relative sizes of the estimated means and 
variances of the regression slopes across the departments. The analyses reported in 
Section 6 imply that in general the means of the slopes are small relative to their 
standard deviations, and so the associated bias is not ignorable. However, this problem 
is confounded with that of multicollinearity; the means of the slopes are relatively small 
or the standard deviations of the slopes relatively large, partly because of 
multicollinearity among the estimated parameters. The issue of multicollinearity is 
discussed in detail in Section 6. We propose a method that would rely on a single model 
fit for all the departments, and we describe simple procedures for selection of this 
model. 

Sections 3 and 4 provide a summary of the empirical Bayes methods and of their 
relevance to the GRE VSS. In Section 5 an "extended" shrinkage method is described, 
and its application for the GRE VSS is discussed. The extended shrinkage can remove 
most of the negative coefficients, and, in judiciously selected models, all the negative 
coefficients. The empirical Bayes models have certain optimality properties, and so the 
coefficients estimated by the extended shrinkage are likely to have poorer statistical 
properties than those obtained by the original empirical Bayes method. Therefore, it is 
important that extended shrinkage be used sparingly. In Section 6 multicollinearity is 
identified as one reason for frequent occurrence of negative estimated coefficients. We 
propose two approaches to combatting this problem: use of simpler models and 
enhancement of software to extend the model choice. The scope of possible 
improvements in statistical modelling of the GRE VSS data is summarized in Section 7. 
In Section 8 a simple method for calculation of the expectancy tables is described. It is 
almost identical to the currently used procedure, but it avoids repeated (numerical) 
multidimensional integration. Analogues of the ordinary regression R² ("proportion of the 
variation explained") are given in Section 9. The report concludes with a discussion of 
admissibility of negative coefficients (Section 10) and a list of recommended changes 
(Section 11). 



3. Empirical Bayes Models 

The first prerequisite for an empirical Bayes analysis is the clustering of the 
observations. In the case of the GRE VSS we have students clustered within graduate 
school departments. Modelling of further clustering of the departments within schools, or 
department years within departments, has not been considered in the operation of the GRE 
VSS because the clusters at the higher levels contain very few units; for example, no 
department has provided data for more than three years, and no school has contributed to 
the data with more than six departments. 

The largest available dataset contains records of the students who communicate best 
in English (dataset CE). It consists of 9,200 records of students from 606 departments, 
collected over the eight cycles of the study. The most recent cycle,² cycle 18, has 2,230 
students, and the previous cycles, 11-17, contain 6,970 students. The data from these cycles 
are pooled in order to make full use of between-department information. Records from the 
same department at different cycles are regarded as separate units, and in this report we 
refer to them as different departments. The departments have provided data for between 
5 and 106 students. Throughout the report we refer to this dataset for illustration. 

Most graduate departments have a very small number of students in any particular 
year, or even over several years, and so estimates of the regression coefficients based solely 
on the data from a department would have very large standard errors. For example, in the 
CE dataset there are two departments with more than 100 students: Department No. 1 has 
102 records, and Department No. 229 has 106 records in the dataset. The within-department 
ordinary regressions for these departments are given in Table 1. We see that only the 
coefficient on U is significantly different from zero (at the 5% level), and that modelling of 
nonlinear regression is impractical because the standard errors become vastly inflated. Even 
for linear regression models any comparison of the regression coefficients across the two 
departments is meaningless because only unrealistically large differences would be 
statistically significant. 

² At the time of writing of the report, September 1989. 

This provides a rationale for application of the empirical Bayes (EB) regression. The 
EB estimates of the department-specific regression coefficients are formed as a mixture of 
two estimates: (a) the estimates of the regression coefficients based solely on the data from 
the department and (b) the coefficients from the pooled regression using data from all the 
departments. 

The within-department regression (a) is unbiased but statistically inefficient in that the 
collateral information, contained in the data from the other departments, is not used. The 
pooled regression (b) is biased but has certain consistency properties. The optimal mixing 
weights are established by the EB procedure. A detailed exposition of the empirical Bayes 
models is given in Braun and Jones (1985) and Braun (1989), and here we provide only a 
minimal summary. Readers interested in further background are referred to the review 
paper by Morris (1983) and the references therein. 

We assume the linear model³

\[
FYA_{ij} = c_j + v_j V_{ij} + q_j Q_{ij} + a_j A_{ij} + u_j U_{ij} + e_{ij}, \tag{1}
\]

where the lowercase letters $c_j$, $v_j$, $q_j$, $a_j$, and $u_j$ denote the coefficients for department $j$ 
($j = 1, 2, \ldots, J$), and the uppercase letters denote the scores for student $i$ ($i = 1, 2, \ldots, n_j$) 
on the relevant variables. The random terms $e_{ij}$ represent the composite of the 
measurement error for FYA and model inadequacy (lack of fit). 

³ Typographical note: Throughout the report the following notation is used: statistical 
parameters are denoted by lowercase characters, vectors of parameters by bold lowercase 
characters, and matrices of parameters by bold uppercase characters. Students' scores are 
denoted by capitals with double subscript ij denoting the student i in department j (e.g., $V_{ij}$). 
For department mean scores the "dot" notation is used (e.g., $V_{\cdot j}$). Estimates of parameters 
and of conditional expectations are denoted by the circumflex (e.g., $\hat{b}$ denotes an estimate for $b$). 

The linear model (1) relates the individual scores to department-level coefficients, which are 
then further related to a set of population parameters: 

\[
\begin{aligned}
c_j &= g_{cc} + g_{cv} V_{\cdot j} + g_{cq} Q_{\cdot j} + g_{ca} A_{\cdot j} + g_{cu} U_{\cdot j} + \delta_{cj} \\
v_j &= g_{vc} + g_{vv} V_{\cdot j} + g_{vq} Q_{\cdot j} + g_{va} A_{\cdot j} + g_{vu} U_{\cdot j} + \delta_{vj} \\
q_j &= g_{qc} + g_{qv} V_{\cdot j} + g_{qq} Q_{\cdot j} + g_{qa} A_{\cdot j} + g_{qu} U_{\cdot j} + \delta_{qj} \\
a_j &= g_{ac} + g_{av} V_{\cdot j} + g_{aq} Q_{\cdot j} + g_{aa} A_{\cdot j} + g_{au} U_{\cdot j} + \delta_{aj} \\
u_j &= g_{uc} + g_{uv} V_{\cdot j} + g_{uq} Q_{\cdot j} + g_{ua} A_{\cdot j} + g_{uu} U_{\cdot j} + \delta_{uj}
\end{aligned} \tag{2}
\]

where $V_{\cdot j}$, $Q_{\cdot j}$, $A_{\cdot j}$, and $U_{\cdot j}$ are the department means for the explanatory variables V, Q, A, and 
U, respectively, the $g$ are the population parameters, and the $\delta$ are (department-level) residual 
terms. The model (2) refers to a specific choice of departmental covariates; in principle any 
variable defined for the departments can be used as a covariate. It is advantageous to 
introduce more compact notation for (1) and (2), such as 



\[
\begin{aligned}
FYA_{ij} &= X_{ij} b_j + e_{ij}, \\
b_j &= X_{\cdot j}\, g + \delta_j,
\end{aligned} \tag{3}
\]

where $X_{ij} = (1, V_{ij}, Q_{ij}, A_{ij}, U_{ij})$, $b_j = (c_j, v_j, q_j, a_j, u_j)^T$, 
$X_{\cdot j} = (1, V_{\cdot j}, Q_{\cdot j}, A_{\cdot j}, U_{\cdot j})$, and 
$\delta_j = (\delta_{cj}, \delta_{vj}, \delta_{qj}, \delta_{aj}, \delta_{uj})^T$. Braun and Jones (1985) use a more compact notation: 

\[
\begin{aligned}
FYA_j &= X_j b_j + e_j, \\
b &= X_{\cdot}\, g + \delta,
\end{aligned} \tag{4}
\]

where $FYA_j$, $e_j$, and $X_j$ are the department vectors and matrix, respectively, corresponding 
to the outcome, the random terms, and the explanatory variables, and $b$, $\delta$, and $X_{\cdot}$ are the 
respective vectors and matrix of the department coefficients, the department-level random 
terms, and the within-department means. In principle, the department means in these 
models can be replaced, or augmented, by other covariates defined for the departments, 
although in the GRE VSS only the department means are used. We assume that the vectors 
of random terms $e$ and $\delta$ are mutually independent and that 

\[
\delta_j \sim N_5(0, \Sigma) \quad \text{(i.i.d.)}, \tag{5}
\]

i.e., $\{\delta_j\}_{j = 1, 2, \ldots, J}$ form a random sample from a five-variate normal distribution with zero 
means and a (nonnegative definite) variance matrix $\Sigma$. The student-level random terms $e_{ij}$ 
are assumed to be themselves mutually independent and distributed according to $N(0, \sigma_j^2)$. 
Alternatively, the model (1) - (2) can be described as a random coefficients model, in which 
the within-department regressions form a set of independent normally distributed random 
variables with a common structure for the mean, and common variance, 

\[
b_j \sim N_5(X_{\cdot j}\, g, \Sigma). \tag{6}
\]

Flexibility of model choice is achieved by deleting department- and/or student-level 
variables from the model (1) - (2). For example, for the CE dataset the following submodel 
of (1) - (2) is considered: 



\[
\begin{aligned}
FYA_{ij} &= c_j + v_j V_{ij} + q_j Q_{ij} + a_j A_{ij} + u_j U_{ij} + e_{ij}, \\
c_j &= g_{cc} + g_{cv} V_{\cdot j} + \delta_{cj} \\
v_j &= g_{vc} + g_{vv} V_{\cdot j} + \delta_{vj} \\
q_j &= g_{qc} + g_{qv} V_{\cdot j} + \delta_{qj} \\
a_j &= g_{ac} + g_{av} V_{\cdot j} + \delta_{aj} \\
u_j &= g_{uc} + g_{uv} V_{\cdot j} + \delta_{uj}
\end{aligned} \tag{7}
\]

Exclusion of a student-level variable, say $A_{ij}$, from the student-level model (1) 
corresponds to setting $a_j = 0$ or, equivalently, $g_{ac} = g_{av} = 0$ and $\delta_{aj} = 0$. In the operation 
of the GRE VSS each of the coefficients for the four variables U, V, Q, and A is either 
constrained to zero or is estimated. This gives rise to the 16 models that are routinely fitted 
for each dataset in the GRE VSS. Note that exclusion of a variable in (1) implies not only 
deletion of the associated row in (7), but also deletion of the corresponding row and column 
of $\Sigma$ (or setting each element of this row and column to 0). 
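
The count of 16 models follows from each of the four slopes being either estimated or constrained to zero. The sketch below only enumerates the model family and the size of the department-level variance matrix for each member; the function names are hypothetical and nothing here reproduces the operational fitting software.

```python
from itertools import combinations

PREDICTORS = ("V", "Q", "A", "U")


def enumerate_submodels(predictors=PREDICTORS):
    """Every subset of predictors, from the intercept-only model to the full model."""
    models = []
    for size in range(len(predictors) + 1):
        models.extend(combinations(predictors, size))
    return models


def sigma_dimension(subset):
    """Rows (and columns) of the variance matrix: the intercept plus retained slopes."""
    return 1 + len(subset)


def sigma_parameter_count(subset):
    """Distinct variances and covariances in an unconstrained symmetric matrix."""
    k = sigma_dimension(subset)
    return k * (k + 1) // 2


models = enumerate_submodels()
for subset in models:
    label = "+".join(subset) if subset else "(intercept only)"
    print(label, sigma_dimension(subset), sigma_parameter_count(subset))
```

For the full model the variance matrix is 5 x 5, which gives 5 variances and 10 covariances.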

A computational algorithm for fitting the EB model is described in the technical 
appendix of Braun and Jones (1985). 

4. Validity of the EB Model 

The empirical Bayes model (1) - (2) provides an idealized description for the available 
data. Firstly, the assumption of normality of the random terms $e_{ij}$ is grossly violated because 
of the "lumpiness" of the data: the outcome score FYA is the average of a small number 
of (integer) grades, which are themselves highly correlated. As a result, more students tend 
to have FYA scores near the values 3, 3.5, and 4, and in several departments only a very 
limited number of possible scores can be achieved. The scale of the FYA is too coarse for 
any assumptions of normality to be satisfied. Also, the scoring of FYA may reflect different 
standards of the institutions, or even of the departments. 

Similarly, the predictor score U is not objectively scaled, and students in a graduate 
department usually come from a variety of undergraduate colleges. For the observed 
predictor scores V, Q, A, and U we have to consider the underlying latent traits as the 
appropriate explanatory variables, and in this perspective the observed scores represent the 
latent traits subject to measurement error. In the EB analysis this component of 
contamination is ignored; no practical methods for its incorporation are available. 

The department-level variables are included in the model to represent the "context" 
of the department. In this respect the within-department means represent proxies for some 
department-level traits. The reliability of such proxies cannot be assessed since we do not 
have a definition of the underlying traits. 

Search for additional predictors for the EB model is likely to be futile unless it is 
based on information about the underlying educational processes. Operationalizing these 
additional predictors would lead to a number of difficulties, including developing a rigorous 
definition of the predictor, devising reliable means of eliciting additional information from 
the departments without loss of cooperation, and so on. 

A large proportion of the students have achieved the perfect score on the outcome 
FYA and/or on the predictor U; about 10% have achieved the perfect FYA score, 4, and 
5% have a U score of at least 3.9. This may significantly diminish the validity of the 
underlying scales and is an additional threat to the assumptions of normality. 

The main advantage of the EB models for the GRE VSS is in their compromise 
between parsimony and adequacy. We prefer to use models with as few parameters as 
possible, while insisting on having all the salient features of the data (and of the processes 
that generate them) explicitly represented in the models. In the absence of complete 
information about these processes the analyst would be inclined to represent in the model 
as much collateral information, in the form of explanatory variables, as possible. This 
improves the chances of generating an adequate model, at the cost of possible redundancy 
and loss of efficiency. 

In standard statistical models several tools for arbitrating between model adequacy 
and statistical efficiency are available. In ordinary regression (ordinary least squares) the 
well-known t- and F-tests are often employed to find variables that make unimportant 
contributions toward description of variation of the outcomes. In the implementation of 
Braun and Jones (1985) the analogues of the t- and F-tests cannot be performed because 
standard errors for the estimated parameters are not available. The likelihood ratio test 
could be used for comparing the quality of fit for two models, one of which is a special case 
of the other. 

The extreme case of model redundancy is multicollinearity, or linear dependence, of 
the predictors. For example, if the scores V, Q, and A were linearly dependent, one of 
these three variables could be excluded from the models without any loss of adequacy. 
Standard statistical packages implement various measures for collinearity such as the 
"Measure for Collinearity" in F4STAT, the square of the partial correlation of the variable 
with the outcome, given all the other predictor variables. Another simple indicator of 
collinearity is the condition number (see Section 6). In practical situations, the more 
explanatory variables are used, the greater the threat of collinearity. This is certainly the 
case in many educational research applications where many predictors are highly correlated 
with the general ability g. 

Consider an alternative representation of the EB model (1) - (2): 

\[
\begin{aligned}
FYA_{ij} = {} & g_{cc} + g_{cv} V_{\cdot j} + g_{cq} Q_{\cdot j} + g_{ca} A_{\cdot j} + g_{cu} U_{\cdot j} \\
& + g_{vc} V_{ij} + g_{vv} V_{\cdot j} V_{ij} + g_{vq} Q_{\cdot j} V_{ij} + g_{va} A_{\cdot j} V_{ij} + g_{vu} U_{\cdot j} V_{ij} \\
& + g_{qc} Q_{ij} + g_{qv} V_{\cdot j} Q_{ij} + g_{qq} Q_{\cdot j} Q_{ij} + g_{qa} A_{\cdot j} Q_{ij} + g_{qu} U_{\cdot j} Q_{ij} \\
& + g_{ac} A_{ij} + g_{av} V_{\cdot j} A_{ij} + g_{aq} Q_{\cdot j} A_{ij} + g_{aa} A_{\cdot j} A_{ij} + g_{au} U_{\cdot j} A_{ij} \\
& + g_{uc} U_{ij} + g_{uv} V_{\cdot j} U_{ij} + g_{uq} Q_{\cdot j} U_{ij} + g_{ua} A_{\cdot j} U_{ij} + g_{uu} U_{\cdot j} U_{ij} \\
& + \gamma_{ij},
\end{aligned} \tag{8}
\]

where $\gamma_{ij} = \delta_{cj} + \delta_{vj} V_{ij} + \delta_{qj} Q_{ij} + \delta_{aj} A_{ij} + \delta_{uj} U_{ij} + e_{ij}$ accumulates all the random terms. 

The threat of collinearity in (8) is obvious. The regressor variables $A_{ij}$, $V_{\cdot j} A_{ij}$, $Q_{\cdot j} A_{ij}$, 
$A_{\cdot j} A_{ij}$, and $U_{\cdot j} A_{ij}$ [a row of (8)] are very closely related. The range of values of the 
department means of the four scores V, Q, A, and U is very narrow, and many departments 
have students with very similar scores, thus strengthening the redundancy in the EB models 
that use several department-level means. The analysis reported in Section 6 indicates acute 
collinearity not only among the 25 predictors, but also within the set of 10 predictors that 
would be considered in the present operation of the GRE VSS as relatively simple models. 
It turns out that the cross-level interactions $V_{\cdot j} V_{ij}$, $V_{\cdot j} Q_{ij}$, $V_{\cdot j} A_{ij}$, and $V_{\cdot j} U_{ij}$, included to take 
account of the context of the department, are the main causes of collinearity. The model 
(7) contains five parameters for the students' scores ($g_{xc}$) and five parameters for the context 
($g_{xv}$). Description of dependence of the outcome on the student background could be 
supplemented by quadratic terms if more adequacy were required, but the description for the 
context contains a lot of redundancy. 
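
To make the source of the redundancy concrete, the sketch below builds the regressors of models such as (7) and (8) as row-wise products of student-level columns with department-level columns. It is an illustration only: the data are simulated (not the CE dataset), the score centres and spreads are hypothetical, and `cross_level_design` is a name introduced here.

```python
import numpy as np


def cross_level_design(scores, dept_ids, covariate_columns):
    """Regressors of a model such as (7) or (8).

    scores            (N, p) student-level scores, e.g. columns V, Q, A, U
    dept_ids          (N,) integer department labels 0..J-1, each used at least once
    covariate_columns indices of the scores whose department means act as covariates
    """
    scores = np.asarray(scores, dtype=float)
    dept_ids = np.asarray(dept_ids)
    n = scores.shape[0]
    counts = np.bincount(dept_ids)
    dept_means = np.column_stack([
        np.bincount(dept_ids, weights=scores[:, k]) / counts
        for k in covariate_columns
    ])
    student = np.column_stack([np.ones(n), scores])                # (1, V, Q, A, U)
    context = np.column_stack([np.ones(n), dept_means[dept_ids]])  # (1, V.j, ...)
    # Row-wise products of every student column with every context column.
    return (student[:, :, None] * context[:, None, :]).reshape(n, -1)


# Simulated data: 60 departments of 20 students each.
rng = np.random.default_rng(0)
n_depts, n_per_dept = 60, 20
dept_ids = np.repeat(np.arange(n_depts), n_per_dept)
centres = np.array([2.8, 3.0, 2.9, 3.3])               # hypothetical score centres
dept_shift = rng.normal(0.0, 0.1, size=(n_depts, 4))   # narrow spread of department means
scores = centres + dept_shift[dept_ids] + rng.normal(0.0, 0.4, size=(dept_ids.size, 4))

X25 = cross_level_design(scores, dept_ids, covariate_columns=[0, 1, 2, 3])  # as in (8)
X10 = cross_level_design(scores, dept_ids, covariate_columns=[0])           # as in (7)

# In X10 the columns come in pairs (x, x * V.j); columns 6 and 7 are A and A * V.j.
corr_A = np.corrcoef(X10[:, 6], X10[:, 7])[0, 1]
print(X25.shape, X10.shape, round(corr_A, 3))
```

When the department means vary little, each interaction column is close to a multiple of its student-level partner, which is the redundancy described above.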

The variance matrix $\Sigma$ that describes the department-level variation contains 15 
parameters: 5 variances and 10 covariances. The parametrization for $\Sigma$ may also contain 
some redundancy, but with the software used in the operation of the GRE VSS no 
constraints on $\Sigma$ can be imposed. Of particular importance would be setting the variances 
to zero (common regression slope for all the departments) and certain covariances to zero, 
so as to enhance statistical efficiency. We note that even in data with a large number of 
clusters (departments) the data may not contain plentiful information about the multivariate 
[...] singular, and would probably be singular if perfect convergence were achieved (in order to 
control costs in the operation, the EB procedure is stopped after a preset number of 
iterations). 

Another element of model redundancy is in allowing separate student-level variances 
$\sigma_j^2$ for each department. As an alternative, a common variance $\sigma^2$ should be considered. 
The need for differing student-level variances could be established only if the data contained 
a lot of large departments. At present, the parsimonious model with common $\sigma^2$ is 
preferable. 

From the formulation (7) we can see that the EB model is a special case of the random 
regression model in which each regression coefficient is either constant across all the 
clusters or varies from cluster to cluster according to the normal law. In (7) the regression 
coefficients in the first line are declared as varying from department to department. 
Software for these general models is available, using the EM algorithm⁴ (Raudenbush and 
Bryk, 1986), the Fisher-scoring algorithm (Longford, 1987), or the iteratively reweighted 
least squares (Goldstein, 1986). The EM algorithm is generally very slow, especially with 
complex models (as many as 500 iterations may be required), while the other two algorithms 
usually require fewer than 15 iterations and provide standard errors for all the estimated 
parameters. Convergence of the EM can be substantially speeded up by simple acceleration 
routines (such as reported in Lindstrom and Bates, 1988). 

⁴ EM stands for Expectation - Maximization; see Dempster, Laird, and Rubin (1977) for 
details. 

5. Negative Coefficients and Extended Shrinkage 

In the model formula (7), rewritten in the form 

\[
\begin{aligned}
FYA_{ij} = {} & g_{cc} + g_{cv} V_{\cdot j} + \delta_{cj} \\
& + (g_{vc} + g_{vv} V_{\cdot j} + \delta_{vj}) V_{ij} \\
& + (g_{qc} + g_{qv} V_{\cdot j} + \delta_{qj}) Q_{ij} \\
& + (g_{ac} + g_{av} V_{\cdot j} + \delta_{aj}) A_{ij} \\
& + (g_{uc} + g_{uv} V_{\cdot j} + \delta_{uj}) U_{ij} + e_{ij},
\end{aligned} \tag{9}
\]

the parentheses in lines 2-5 contain the respective regression coefficients on the scores V, 
Q, A, and U. We can see from (9) how the regression coefficient, say, 

\[
g_{vc} + g_{vv} V_{\cdot j} + \delta_{vj}
\]

for V, depends on the department means $V_{\cdot j}$. A condition for these "true" coefficients to 
be nonnegative for all the departments is that their expectations $g_{vc} + g_{vv} V_{\cdot j}$ be nonnegative 
for all values of $V_{\cdot j}$ that occur in the data, and substantially larger than the standard 
deviation (the square root of the variance) of $\delta_{vj}$, 

\[
g_{vc} + g_{vv} V_{\cdot j} > (\Sigma_{11})^{1/2},
\]

and analogously for the other variables, 

\[
\begin{aligned}
g_{qc} + g_{qv} V_{\cdot j} &> (\Sigma_{22})^{1/2} \\
g_{ac} + g_{av} V_{\cdot j} &> (\Sigma_{33})^{1/2} \\
g_{uc} + g_{uv} V_{\cdot j} &> (\Sigma_{44})^{1/2}.
\end{aligned} \tag{10}
\]

In practice we consider the estimates of the parameters $g_{xy}$ and of the variance matrix $\Sigma$ in 
place of the parameters in (10), and the posterior expectations of the random terms $\delta_{xj}$ in 
place of the random terms in (9); see (11). The condition (10) is less likely to be satisfied 
the more overparametrization there is for the variance matrix $\Sigma$. In particular, when the 
estimate of $\Sigma$ is singular, its diagonal elements (the estimated variances) tend to be inflated. 

The conditional expectations for the within-department regression coefficients 

\[
b_j = (c_j, v_j, q_j, a_j, u_j)
\]

are obtained by the formula 

\[
\begin{aligned}
\hat{b}_j &= E(b_j \mid FYA, X; \{\sigma_j^2\}, G, \Sigma) \\
&= (P^* + P_j)^{-1} (P^* G^T X_{\cdot j} + P_j B_j),
\end{aligned} \tag{11}
\]

where $P^* = \Sigma^{-1}$, $P_j = X_j X_j^T / \sigma_j^2$, $B_j = (X_j X_j^T)^{-1} X_j y_j$ is the within-department ordinary least 
squares solution, $X_{\cdot j}$ the vector of within-department means of the scores on X, and $G$ is the 
matrix of the parameters $g_{xy}$. The unknown parameters in (11) are replaced by their 
maximum likelihood estimates. The vector of coefficients (11) is a mixture of two estimates, 
the pooled regression estimate $G^T X_{\cdot j}$ and the within-department regression $B_j$. An 
alternative interpretation is that the within-department regression estimates are being shrunk 
toward the overall regression, and the amount of shrinkage is determined so as to optimally 
combine the within-department and the pooled-data information. The mixing weight for the 
within-department estimate is given by the (estimated) within-department information, $P_j$. Since the 
within-department regressions have a lot of sampling variability, a necessary condition for 
obtaining nonnegative coefficients for all the departments is that the "stable" component, the 
pooled regression estimate $G^T X_{\cdot j}$, be positive for all the departments. Even if this condition 
is satisfied, a negative coefficient for a department can arise when the within-department 
regression has a negative coefficient so large in absolute value that it remains negative even 
after the shrinkage to the pooled regression. 
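
The mixture in (11) can be computed directly once the variance parameters are treated as known. The following is a minimal sketch under stated assumptions, not the algorithm of Braun and Jones (1985): $P^*$ is taken to be the inverse of the department-level variance matrix (assumed nonsingular), the design matrix is stored with one row per student (so the information matrix is written `X_j.T @ X_j`), and the function name, data, and parameter values are hypothetical.

```python
import numpy as np


def eb_coefficients(X_j, y_j, sigma2_j, Sigma, pooled_mean):
    """Posterior mean of b_j in the form of formula (11).

    X_j         (n_j, p) design matrix, one row per student
    y_j         (n_j,) outcomes (FYA)
    sigma2_j    student-level variance for the department
    Sigma       (p, p) department-level variance matrix, assumed nonsingular
    pooled_mean (p,) pooled-regression prediction of b_j
    """
    P_star = np.linalg.inv(Sigma)      # information carried by the pooled regression
    P_j = X_j.T @ X_j / sigma2_j       # within-department information
    # P_j @ B_j equals X_j^T y_j / sigma2_j, so B_j is never formed explicitly.
    rhs = P_star @ pooled_mean + X_j.T @ y_j / sigma2_j
    return np.linalg.solve(P_star + P_j, rhs)


# Hypothetical department with 12 students and a single predictor.
rng = np.random.default_rng(1)
x = rng.uniform(2.0, 4.0, size=12)
X_j = np.column_stack([np.ones(12), x])
y_j = 2.5 + 0.2 * x + rng.normal(0.0, 0.3, size=12)
pooled_mean = np.array([2.8, 0.1])

ols, *_ = np.linalg.lstsq(X_j, y_j, rcond=None)
weak_prior = eb_coefficients(X_j, y_j, 0.09, 1e8 * np.eye(2), pooled_mean)
strong_prior = eb_coefficients(X_j, y_j, 0.09, 1e-8 * np.eye(2), pooled_mean)
print(ols, weak_prior, strong_prior)
```

The two limiting cases bracket the behaviour described in the text: a very diffuse department-level distribution returns the within-department regression, and a very tight one returns the pooled prediction.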

The straightforward solution to this aberration is to extend the shrinkage until the 
coefficient is shrunk to zero. Such an extended shrinkage is guaranteed by the positive 
regression coefficients of the pooled-data regression. Of course, no theoretical justification 
for this procedure can be given, other than prior information that all the departments have 
nonnegative regression coefficients. The procedure can be partly justified on the grounds 
of poor resampling properties of the estimators of the variance matrix $\Sigma$ (because of 
over-parametrization) and of the variances $\sigma_j^2$ (based on too few data-points). 

The consequences of this extended shrinkage have to be carefully weighed. First, if 
sampling variation of the estimators for $\Sigma$ and $\sigma_j^2$ is ignored, the extended shrinkage is less 
optimal, in terms of statistical efficiency, than the original shrinkage determined by the EB 
procedure. Therefore, this adjustment method should be used sparingly. Second, if we 
insist on nonnegative estimated coefficients, then for successful application of the extended 
shrinkage we require an EB solution for which $G^T X_{\cdot j}$ has nonnegative components for all 
values of the departmental covariate(s) $X_{\cdot j}$. The matrix of parameters $G$ can be consistently 
estimated by ordinary least squares, that is, by analyzing the entire dataset without 
department identification (treating all students as a single department). Thus, the burden 
of model selection is shifted to the pooled-data regression. 
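
One possible reading of the extended shrinkage is sketched below. It is a simplification, labeled as such: the EB shrinkage in (11) is matrix-weighted, whereas this sketch moves the whole coefficient vector along the straight segment from the EB estimate toward the pooled estimate, using a single scalar weight, and stops as soon as every slope is nonnegative. The function name and the pooled coefficients in the example are hypothetical; the EB coefficients are those of the regression formula quoted in Section 1.

```python
import numpy as np


def extended_shrinkage(b_eb, b_pooled, slope_index=(1, 2, 3, 4)):
    """Shrink b_eb further toward b_pooled until all slopes are nonnegative.

    Returns the adjusted coefficients and the extra shrinkage weight lam in [0, 1];
    lam = 0 means that no extension was needed.
    """
    b_eb = np.asarray(b_eb, dtype=float)
    b_pooled = np.asarray(b_pooled, dtype=float)
    idx = list(slope_index)
    if np.any(b_pooled[idx] < 0):
        raise ValueError("pooled slopes must be nonnegative for the extension to work")
    lam = 0.0
    for k in idx:
        if b_eb[k] < 0:
            # Solve (1 - lam) * b_eb[k] + lam * b_pooled[k] = 0 for lam.
            lam = max(lam, -b_eb[k] / (b_pooled[k] - b_eb[k]))
    shrunk = (1.0 - lam) * b_eb + lam * b_pooled
    shrunk[idx] = np.maximum(shrunk[idx], 0.0)   # remove rounding residue at zero
    return shrunk, lam


# EB coefficients (intercept, V, Q, A, U) from the example formula in Section 1.
b_eb = [2.8, 0.09, 0.14, -0.02, 0.04]
# Hypothetical pooled coefficients for the same department.
b_pooled = [2.6, 0.10, 0.12, 0.03, 0.08]

shrunk, lam = extended_shrinkage(b_eb, b_pooled)
print(shrunk, lam)
```

The weight `lam` could serve as the monitored "amount of extended shrinkage": values near zero indicate a small adjustment, values near one indicate that the department's own data have been almost entirely overridden.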

Model selection based on the pooled-data regression may turn out to be 
advantageous for the GRE VSS. If model selection is based on ordinary pooled-data 
regression, the use of the computationally intensive EB procedures can be postponed to the 
fitting of only a very small number of models. At the first stage, ordinary regression models 
would be fitted for the pooled data using the student-level scores, their department-level 
means, and the cross-level interactions. One or a small number of parsimonious models 
would be adopted for which all the departments have nonnegative pooled regression 
coefficients. The corresponding EB models would be fitted, with the addition of the 
extended shrinkage. The amount of extended shrinkage would be monitored to provide an 
additional criterion for selection among the EB model fits. Various diagnostic procedures 
for multicollinearity in ordinary regression can be directly applied, as discussed in Section 6. 
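
The first stage described above can be sketched as an ordinary least squares fit of the pooled model with cross-level interactions, followed by a check of the implied slopes over the department means that occur in the data. This is an illustration under assumptions: the model has the form of (7) with the single covariate $V_{\cdot j}$, and the function names and the example parameter matrix are hypothetical.

```python
import numpy as np

PREDICTORS = ("V", "Q", "A", "U")


def fit_pooled(scores, v_bar, y):
    """OLS fit of the pooled regression implied by a model of the form (7).

    scores (N, 4) student scores V, Q, A, U
    v_bar  (N,) department mean of V attached to each student
    Returns G as a (5, 2) array; row x holds (g_xc, g_xv) for x in (c, v, q, a, u).
    """
    n = scores.shape[0]
    student = np.column_stack([np.ones(n), scores])   # (1, V, Q, A, U)
    context = np.column_stack([np.ones(n), v_bar])    # (1, V.j)
    X = (student[:, :, None] * context[:, None, :]).reshape(n, -1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta.reshape(5, 2)


def negative_slopes(G, v_bar_values):
    """For each predictor, the department means at which g_xc + g_xv * V.j < 0."""
    v = np.unique(v_bar_values)
    report = {}
    for name, (g_c, g_v) in zip(PREDICTORS, G[1:]):
        report[name] = v[g_c + g_v * v < 0]
    return report


# Hypothetical pooled estimates in which the slope on A depends strongly on V.j.
G_example = np.array([
    [2.80, 0.0],
    [0.09, 0.0],
    [0.14, 0.0],
    [-0.50, 0.2],
    [0.04, 0.0],
])
report = negative_slopes(G_example, [2.0, 2.4, 3.0])
print({name: values.tolist() for name, values in report.items()})
```

A candidate model would be retained for the EB stage only if every entry of the report is empty, that is, if no department mean in the data implies a negative pooled slope.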

6. Multicollinearity in EB Regression 

For the analysis of the CE dataset in the operation of the GRE VSS the EB model 
(7) with the covariate V.j is used. The expectation of an outcome is 
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£(FYAij) = (1 V,j Q,j Aij Ui|)G(l V,,f. 



or in matrix notation 



£(FYA,j) = XP, 



( 12 ) 



where X is a matrix with N = 9,200 rows (students) and 10 columns representing the 



parameter vector p is uniquely defined only if X is of full rank, r(X) = 10. Otherwise, if X 
is singular, a column of the matrix X could be reconstructed as a linear combination of the 
other columns, say. 



where X.jq is the Nx9 matrix formed from X by deleting the 10th column, and B is a 9x9 
matrix of full rank. Then the regression formula becomes 



with /3* = B^. Therefore, the 10th variable could be deleted from the model and the data 



corresponding elements of p, thus making these parameters difficult to interpret. 

Often in regression problems the matrbc X is of full rank, but is almost singular. 
Proximity of the matrix X to singularity is referred to as multicollinearity, or ill-conditioning. 

The extent of multicollinearity of the design matrbc X can be established by an 
eigenvalue analysis of the corresponding matrix of crossproducts, X^X. Let 





(13) 



£(FYA,j) = X 10^-, 



(14) 



description simplifie ’. Some elements of p' may be substantially different from the 
be the eigenvalue decomposition for $X^{T}X$, with the (positive) eigenvalues
$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{10}$ in descending order. The ratio of the largest and smallest eigenvalues is referred to as
the condition number, and it is a useful indicator of multicollinearity in ordinary regression.
A large condition number implies almost linear dependence of a column of the design
matrix X, such as (13). Let $\hat\beta$ be the estimate of $\beta$ obtained by EB analysis. Then the
vector of fitted expected values, $X\hat\beta$, is very close to the vector of fitted expected values from
the model with the abbreviated list of regressors (14), $X_{-10}\hat\beta^{*}$. Thus, we have two vectors of
estimates for $\beta$, $\hat\beta$ and $(\hat\beta^{*}, 0)$, which lead to very similar fitted expected values, but the
corresponding elements of these vectors may be substantially different, and may even have
different signs. Multicollinearity in ordinary regression is associated with highly inflated
standard errors for some of the estimates. Also, the estimates tend to be very unstable; a
small change in the data, or in the model specification, may cause a profound change in the
estimates. The EB estimates share these undesirable properties; the inflation of the
standard errors could be demonstrated if the standard errors of the estimates were available,
and instability can be observed by the substantial changes in the estimates that occur after
even a modest change of the model specification.
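
The condition number described above can be computed directly from the eigenvalues of the matrix of crossproducts. The following is a minimal sketch, assuming NumPy is available; the simulated scores, their scale, and the names `condition_number`, `X_main` and `X_full` are hypothetical and are not taken from the CE dataset. It illustrates how adding a cross-level interaction column tends to inflate the condition number.

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest eigenvalue of X^T X."""
    eigenvalues = np.linalg.eigvalsh(X.T @ X)  # returned in ascending order
    return eigenvalues[-1] / eigenvalues[0]

# Hypothetical data: a student-level score and a department mean attached
# to each student (both on an arbitrary scale chosen for illustration).
rng = np.random.default_rng(0)
n = 500
v = rng.normal(5.0, 1.0, n)
v_dept = rng.normal(5.0, 0.3, n)

# Regressors 1, V_ij, V.j, and then the same set with the interaction V_ij * V.j.
X_main = np.column_stack([np.ones(n), v, v_dept])
X_full = np.column_stack([X_main, v * v_dept])

kappa_main = condition_number(X_main)
kappa_full = condition_number(X_full)
print(f"without interaction: {kappa_main:.3g}, with interaction: {kappa_full:.3g}")
```

Because the crossproduct matrix of the smaller regressor set is a principal submatrix of the larger one, the condition number cannot decrease when a column is added; how much it grows depends on the data.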

When the estimated regression parameters are unstable, they are less likely to satisfy 
certain inequalities believed to hold for the parameters, such as 

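$$\hat g_{vc} + \hat g_{vv} V_{.j} > 0$$

$$\hat g_{qc} + \hat g_{qv} V_{.j} > 0$$

$$\hat g_{ac} + \hat g_{av} V_{.j} > 0$$

$$\hat g_{uc} + \hat g_{uv} V_{.j} > 0, \qquad (15)$$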

[compare with (10)] for the values of V.j that occur in the dataset, or in the underlying 
population of departments. The left-hand sides of the inequalities in (15) are the estimated 
average slopes of the respective scores V, Q, A and U in a department with the mean V 
score V,j. 

Suppose $\hat g_{qc} + \hat g_{qv}V^{*} < 0$ for a specified value of $V^{*}$. The fitted
department-regression coefficient on Q for a department with V.j = $V^{*}$ (denoted by $\hat q_j$)
differs from the mean regression coefficient $\hat g_{qc} + \hat g_{qv}V^{*}$ by the posterior mean of the deviation $\delta_{qj}$.
The mean of the posterior deviations $\hat\delta_{qj} = \hat q_j - (\hat g_{qc} + \hat g_{qv}V^{*})$ for the departments with the
department-mean V score $V^{*}$ is very close to zero (unless the model is seriously
misspecified), and therefore at least half the departments with department-mean V scores
in the vicinity of $V^{*}$ have negative fitted coefficients on Q. Nevertheless, owing to the
multicollinearity of X, it is feasible that the fit for the data could be almost exactly
reproduced by a completely different set of regression parameters. Although this would
imply substantially different department-regression coefficients $b_j$, there might be only
insubstantial changes in the fitted values for the students. In other words, the currently
applied EB model may have very good crossvalidation properties for prediction of the
outcome scores {$y_{ij}$} (see Braun and Jones, 1985, Section 3.6), but not such good properties
for prediction of the regression coefficients.

The eigenvalues of the matrix of crossproducts for (1, $V_{ij}$, $Q_{ij}$, $A_{ij}$, $U_{ij}$, V.j, $V_{ij}V_{.j}$,
$Q_{ij}V_{.j}$, $A_{ij}V_{.j}$, $U_{ij}V_{.j}$), the regressors implied by the model (7), are

2.71x10"', 2.02xl0^ 1.28xl0\ 7.58xl0\ 2.93xl0^ 

679, 26.5, 14.8, 9.25, 1.34, 

and so the condition number is about 2xl0"’. If we simplify the structure of the model 
(12) by excluding the regressors V.j, $V_{ij}V_{.j}$, $Q_{ij}V_{.j}$, $A_{ij}V_{.j}$ and $U_{ij}V_{.j}$, that is, by deleting the
covariate V.j, the corresponding matrix of crossproducts has the eigenvalues

3.23x10^, 2.50xl0\ 1.49xl0-\ 921, 118, 

with the condition number of about 2,750. Deletion of these five regressors, 
corresponding to deletion of the covariate V.j from (7), may lead to a substantially 
poorer fit to the data. It would be more appropriate to delete one regressor at a time 
and assess the loss of adequacy of the resulting regression fit. Deletion of a variable - 
say A - corresponds to deletion of the regressors $A_{ij}$ and $A_{ij}V_{.j}$ (a row of the matrix G),
and the eigenvalues of the corresponding matrix of crossproducts are 

2.09xl0^ 1.78xl0\ 1.37xl0^ 2.38xl0^ 658, 23.6, 14.6, 1.36, 

(condition number 1.55x10^), and so multicollinearity remains present. Deletion of a 
variable (of a GRE test score, since U is the predictor with the largest estimated slope and 
smallest standard error) would also be undesirable on theoretical grounds; it is believed that 
the GRE analytical score A makes a contribution toward description of the performance in 
graduate school above and beyond the verbal and quantitative scores. Sources of
multicollinearity can be explored in more detail by considering submatrices of the 
functionally related regressors. For example, the eigenvalues corresponding to the 
regressors 1, $V_{ij}$, V.j, $V_{ij}V_{.j}$ are

$6.51\times10^{5}$, $2.71\times10^{3}$, 497, 3.91

(condition number $1.66\times10^{5}$). This indicates that the regressor $V_{ij}V_{.j}$ could be deleted with
minimal loss to the quality of the fit. A similar pattern of the eigenvalues is observed for
the other sets of regressors 1, $Z_{ij}$, V.j, $Z_{ij}V_{.j}$, where Z is either U, Q, or A (the

corresponding condition numbers are 6.75x10^, 1.65x10^ and 1.88x10^). 

We consider two methods for combatting multicollinearity. The first method involves 
model simplification and a description of the causes of multicollinearity. The second 
method, ridge regression (Hoerl and Kennard, 1970), is a general principle based on 
adjustment of the matrix of crossproducts X^X. 

Model Simplification 

Within the framework of the algorithm of Braun and Jones (1985) and the 
operational software based on it, two kinds of model simplification are possible: (1) 

deletion of a covariate (a column of G), and (2) deletion of a variable (a row of G). For 
the CE dataset the former leads to deletion of five regressors and possibly a substantially 
poorer fit for the data. By deleting a variable two regressors are removed, but, as the 
eigenvalue analysis indicates, this would not reduce the acute multicollinearity among the 
regressors. 
Clearly, multicollinearity is caused by the cross-level interactions $V_{ij}V_{.j}$, $Q_{ij}V_{.j}$, $A_{ij}V_{.j}$,
$U_{ij}V_{.j}$, some or all of which should be deleted from the model, while retaining the
department mean V.j, or if necessary even adding another mean (e.g., U.j) to the list of
regressors. These models cannot be fitted by the operational software based on Braun and
Jones (1985), although in other implementations of the EB models, such as Bryk,
Raudenbush, Seltzer, and Congdon (1988), Rasbash, Prosser, and Goldstein (1988) and
Longford (1988) they can be fitted routinely.

Ridge Regression 

Ridge regression is a standard method for combatting multicollinearity in ordinary
regression. If the matrix of crossproducts $X^{T}X$ has a small eigenvalue, then its inverse
$(X^{T}X)^{-1}$ and the ordinary least squares solution $(X^{T}X)^{-1}X^{T}y$ are unstable. Stability of the
solution can be enhanced by replacing the matrix of crossproducts by $X^{T}X + hI$, where I is
the unit matrix and h > 0 a tuning constant. The choice for h should be such as to induce
little bias (the smaller h the lesser the bias) and to promote stability (the higher h the higher
the eigenvalues of $X^{T}X + hI$, and the more stable the ridge regression solution
$(X^{T}X + hI)^{-1}X^{T}y$). In the EB approach we may consider applying ridge regression for the
within-department regressions as well as for estimation of the matrix G. The
within-department regressions involve ordinary least squares, and so the application of the
ridge regression is straightforward as long as we have an intelligent method of choosing the
constant $h_j$; each department may have a different ridge constant. The matrix G is
estimated by the multivariate regression

$$\hat G = (Z^{T}Z)^{-1}Z^{T}R,$$

where Z is the matrix of covariates, with rows (1 V.j), and R is the matrix of the estimated
department-regression coefficients, consisting of rows $b_j$. An opportunistic choice for the
ridge constant would be the smallest value of h for which the components of
$(Z^{T}Z + hI)^{-1}Z^{T}R$ satisfy the inequalities (15). The choice of the ridge constant h affects the
estimates of the regression parameters G, which in turn influence the estimate of the matrix
of department-regression coefficients R. Thus, another layer of iterations of the EB
procedure would have to be implemented, which would iteratively calculate the matrices G
and R and update all the variance and covariance parameters in the process. A "short-cut"
solution would involve finding a suitable value of h for the fixed set of within-department
regression coefficients R obtained after convergence of the operational EB algorithm.
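
The "short-cut" choice of the ridge constant can be sketched as a search over a grid of candidate values of h for a fixed matrix R. This is only an illustration under stated assumptions: NumPy is assumed, the grid, the toy data, and the function names `ridge_solution`, `implied_slopes` and `smallest_admissible_h` are hypothetical, and the columns of R are taken to be slopes only, so that every implied mean slope is checked against the inequalities (15).

```python
import numpy as np

def ridge_solution(Z, R, h):
    """Ridge estimate (Z^T Z + h I)^{-1} Z^T R of the matrix G."""
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + h * np.eye(k), Z.T @ R)

def implied_slopes(Z, R, v_values, h):
    """Mean slopes g_c + g_v * V for each value V in v_values."""
    G = ridge_solution(Z, R, h)
    design = np.column_stack([np.ones(len(v_values)), v_values])
    return design @ G

def smallest_admissible_h(Z, R, v_values, grid):
    """Smallest h on the grid for which all implied mean slopes are positive."""
    for h in sorted(grid):
        if np.all(implied_slopes(Z, R, v_values, h) > 0):
            return h
    return None  # no value on the grid satisfies the inequalities

# Toy example: three departments, one slope, covariate V.j = 4, 5, 6.
v_values = np.array([4.0, 5.0, 6.0])
Z = np.column_stack([np.ones(3), v_values])
R = np.array([[0.05], [0.02], [-0.01]])
grid = np.geomspace(1e-3, 1e3, 61)

h_chosen = smallest_admissible_h(Z, R, v_values, grid)
print("chosen ridge constant:", h_chosen)
```

A grid search is used here for transparency; the full procedure described in the text would re-estimate R and the variance parameters after each change of h, which this sketch does not attempt.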

7. Scope for Improvement of the Regression Model 

The importance of the covariate V.j can be explored by fitting the EB model in two 
stages. First we fit the EB model (1) with no covariates ("shrinking to a point" in the
terminology of Braun and Jones, 1985) and obtain the within-department regression
coefficients. In the second stage, these within-department regression coefficients are 
regressed on the covariate V.j. The results of this stage are displayed in the "Submodel" 
column of Table 2. For comparison, we fit the operational model, (7) (using V.j as a 
covariate), and regress the resulting department-regression coefficients on V.j. Having 
accounted for the covariate V.j in the model (7), the estimated slopes on V.j should be equal 
to zero. But the estimated slopes, displayed under the column heading "Operational model" 
in Table 2, are of comparable size with the corresponding estimated slopes for the 
submodel. Some of the simple regressions on V.j are even significant, using the traditional 
t-ratio test, say, at the 5% level of significance. We see that the intended role of the 
covariate in the operational model, to account for systematic variation due to V.j, has not 
been fulfilled. This is most likely due to the combination of acute multicollinearity and 
imperfect convergence of the employed algorithm. 

Another area of possible model improvement is in nonlinear regression. In general 
we can consider a polynomial regression in the variables V, Q, A, and U. It turns out that
a small number of quadratic terms significantly improve the fit of the model, and these 
additional variables contribute only marginally to multicollinearity. In the operational 
software these variables would have to be associated with between-department variation, but 
in other software the associated variances could be constrained to zero. We note that 
nonlinear regression would substantially complicate the discussion of negative coefficients. 
and would involve substantial changes in the presentation of the regression formulas in the
GRE VSS reports. 

Variances and Covariances in $\Sigma$

The variances in the matrix $\Sigma$ can be interpreted as a measure of variation of the
within-department regressions. A randomly selected department with the verbal score mean
V. has the slope on Q equal to $q = g_{qc} + g_{qv}V_{.} + \delta_q$, where $\delta_q \sim N(0, \Sigma_{qq})$, or equivalently,
$q \sim N(g_{qc} + g_{qv}V_{.}, \Sigma_{qq})$. Therefore, the probability that q is positive is equal to
$\Phi\{(g_{qc} + g_{qv}V_{.})/\Sigma_{qq}^{1/2}\}$, where $\Phi$ is the distribution function for the standard normal
distribution N(0,1). The estimates of the variances in $\Sigma$, in conjunction with the regression
parameter estimates, indicate the frequency of negative department-regression coefficients.
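
As a worked illustration of this probability, the sketch below evaluates it for a mean slope and a standard deviation both equal to .06, the approximate values quoted for the quantitative score in Section 10. The function names are hypothetical, and only the Python standard library is used.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_positive_slope(mean_slope, variance):
    """Probability that a department slope q ~ N(mean_slope, variance) is positive."""
    return normal_cdf(mean_slope / sqrt(variance))

# Mean slope .06 and standard deviation .06 (values quoted in Section 10).
p_positive = prob_positive_slope(0.06, 0.06 ** 2)
p_negative = 1.0 - p_positive
print(f"positive: {p_positive:.3f}, negative: {p_negative:.3f}")
```

With these inputs the probability of a negative slope is the standard normal distribution function evaluated at -1, which agrees with the figure of about 16% given in Section 10.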

Multicollinearity can also arise among the estimated variances and covariances. The 
procedures for detecting multicollinearity can be based on the estimated information matrix 
(and its eigenvalue decomposition) for the variances and covariances or the standard errors 
associated with these parameters. These are not available in the operational software, but 
are readily available in the software based on the iteratively reweighted least squares 
procedure (Goldstein, 1986) and the Fisher-scoring algorithm (Longford, 1987). 
Multicollinearity is present to some extent among the variance and covariance parameters, 
because information about the within-department slopes is very scarce, but the problem is 
not as acute as for the regression parameters. To alleviate multicollinearity, several 
covariances could be constrained to 0, and so the number of (co-)variance parameters in $\Sigma$
would be reduced from 15 to 12 or even lower. Moreover, constraining the coefficients on
A to a constant, and those of V to a different constant - which implies constraints on two
more variance and seven more covariance parameters - would, in the CE dataset, be
justified. For the CE dataset the corresponding likelihood ratio statistic is equal to 10.1 ($\chi^2$
null-distribution with 9 degrees of freedom), indicating insignificant loss of model adequacy.
Such model simplification would contribute toward reduction of the number of departments
with negative department-regression coefficients since less multicollinearity in $\Sigma$ would lead
to smaller estimates of the variances and more pronounced shrinkage.
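
The likelihood ratio statistic of 10.1 on 9 degrees of freedom can be converted to a tail probability. The sketch below uses a closed-form expression for the chi-square upper tail that holds for odd degrees of freedom, chosen here only to avoid a dependency on a statistics library; the function name `chi2_upper_tail_odd_df` is hypothetical.

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail_odd_df(x, df):
    """Upper-tail probability of a chi-square variate with odd df (closed form)."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form applies to odd degrees of freedom only")
    if x <= 0:
        return 1.0
    total = 0.0
    term = sqrt(x)  # x^(1/2) / 1!!
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)  # next term x^(j+1/2) / (2j+1)!!
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 / pi) * exp(-x / 2.0) * total

# Likelihood ratio statistic and degrees of freedom quoted for the CE dataset.
p_value = chi2_upper_tail_odd_df(10.1, 9)
print(f"tail probability: {p_value:.3f}")
```

A tail probability well above conventional significance levels is consistent with the conclusion in the text that the constraints cause no significant loss of model adequacy.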
Modelling Within-department Variation

In Braun and Jones (1985) different within-department variances $\sigma_j^2$ are fitted. Their
estimates are based on the (iteratively updated) within-department sums of squares of
residuals, and so for small departments they have very poor resampling properties. Many
departments with small numbers of students with a wide range of backgrounds or a small
range of outcomes have very small estimated variances $\hat\sigma_j^2$. In the formula for the
within-department regression coefficients $b_j$, (11), with the parameters replaced by their
estimates, the between-department and within-department regressions are weighted in the
proportions of their estimated precisions. A small value of the estimate $\hat\sigma_j^2$ then causes the
within-department regression to be inaccurately regarded as very well determined ($X_j^{T}X_j/\hat\sigma_j^2$
is very large relative to $\Sigma^{-1}$), and therefore minimal shrinkage takes place. The plots in
Figure 1 demonstrate the association of the estimated regression coefficients with the 
estimated within-department variance. In these plots only the departments with fitted 
variance cr^ < A are represented. Among the departments with larger fitted variance there 
are only three instances of negative fitted coefficients (each with respect to the score A). 
The EB algorithm could be adapted to estimate a common within-department variance 
to hedge against this phenomenon, as well as to promote model parsimony. Technical 
details are given in Appendix B. 
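
The effect of a very small estimated within-department variance on shrinkage can be seen in a one-coefficient version of the precision-weighted formula. This is a simplified scalar sketch, not the operational multivariate algorithm; all numerical values and the names `eb_scalar`, `tau2` and `sxx` are hypothetical.

```python
def eb_scalar(b_pooled, b_within, tau2, sxx, sigma2):
    """Precision-weighted compromise between pooled and within-department slopes.

    tau2   : between-department variance of the slope
    sxx    : within-department sum of squares of the predictor
    sigma2 : estimated within-department residual variance
    """
    p_between = 1.0 / tau2
    p_within = sxx / sigma2
    weight_within = p_within / (p_between + p_within)
    estimate = weight_within * b_within + (1.0 - weight_within) * b_pooled
    return estimate, weight_within

# The same department data, evaluated with a common variance and with a
# very small department-specific variance estimate.
est_common, w_common = eb_scalar(0.06, -0.10, tau2=0.0036, sxx=2.0, sigma2=0.25)
est_tiny, w_tiny = eb_scalar(0.06, -0.10, tau2=0.0036, sxx=2.0, sigma2=0.001)
print(f"common variance: estimate {est_common:.4f}, weight {w_common:.3f}")
print(f"tiny variance:   estimate {est_tiny:.4f}, weight {w_tiny:.3f}")
```

In this example the tiny variance estimate puts most of the weight on the within-department slope, so the negative least-squares value is barely shrunk, whereas the common variance pulls the estimate close to the pooled value.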

Thus, a common variance ensures more equitable shrinkage, but we note that an
overparametrized regression part of the model may cause some of the coefficients for some
departments to shrink toward negative values. Therefore, application of extended shrinkage
is suitable only in conjunction with careful choice of the EB model.

8. Expectancy Tables 

In the current procedures, computation of the expectancy tables involves numerical 
integration with respect to a five-variate normal density. The number of random draws from 
the integrating distribution, set at 100, is most likely insufficient, and that causes aberrant 
features in the simulated expectancy tables. We propose a method that involves no
numerical integration and guarantees unimodality of the row- and column-distributions in
the expectancy tables.

The posterior distribution of the department-regression coefficients is 

$$(b_j \mid G, \Sigma, \sigma_j^2) \sim N[r_j, (P^{*} + P_j)^{-1}],$$

where $r_j$ is the vector of the posterior means for department j, $P^{*} = \Sigma^{-1}$ and $\Sigma$ is the
unconditional variance of {$b_j$}, and $P_j = X_j^{T}X_j/\sigma_j^2$ is the within-department information
matrix.

The fitted regression formula for department j is 

$$y_{ij} = x_{ij}b_j + \varepsilon_{ij},$$

so that the posterior distribution of the outcome $y_{ij}$ given the vector of scores $x_{ij}$ is

$$y_{ij} \sim N(a_{ij}, d_{ij}), \qquad (16)$$

where $a_{ij} = x_{ij}r_j$ and $d_{ij} = \sigma_j^2 + x_{ij}(P^{*} + P_j)^{-1}x_{ij}^{T}$. Note that different vectors of background
scores $x_{ij}$ may yield the same mean $a_{ij}$ but different variances $d_{ij}$.
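
Given a posterior mean and variance as in (16), the probability that the outcome falls in a range of scores follows from the normal distribution function without any numerical integration. The sketch below shows one natural way to fill a row of an expectancy table under that assumption; the cutpoints, the numerical values, and the names `range_probability` and `expectancy_row` are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def range_probability(c1, c2, a, d):
    """Probability that y ~ N(a, d) lies between c1 and c2."""
    s = sqrt(d)
    return normal_cdf((c2 - a) / s) - normal_cdf((c1 - a) / s)

def expectancy_row(cutpoints, a, d):
    """Probabilities of the outcome ranges defined by the sorted cutpoints."""
    edges = [float("-inf"), *cutpoints, float("inf")]
    return [range_probability(lo, hi, a, d) for lo, hi in zip(edges, edges[1:])]

# Same predicted score, two different posterior variances.
row_small_var = expectancy_row([2.5, 3.0, 3.5], a=3.25, d=0.0625)
row_large_var = expectancy_row([2.5, 3.0, 3.5], a=3.25, d=0.25)
print([round(p, 3) for p in row_small_var])
print([round(p, 3) for p in row_large_var])
```

The two rows share the same mean but differ in spread, which is the point made in the text about vectors of background scores that yield the same mean and different variances.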

Calculation of the Expectancy Tables

The expectancy tables contain the estimated conditional probabilities of the outcome
score $y_{ij}$ in a given department, given that predicted score $a_{ij}$ is in a specified range. If we
conditioned on the scores $x_{ij}$, a standard confidence interval could be derived from (16) by
ignoring the sampling variability due to estimation of $a_{ij}$ and $d_{ij}$. If for a given posterior
mean the variance $d_{ij}$ as a function of the predictor vector $x_{ij}$ has a wide range of values,
estimation of the probabilities in the expectancy tables could be substantially improved by
conditioning on the future scores $x_{ij}$. For departments with small numbers of students, it
would be meaningful to consider the average of the fitted posterior variances $d_j = \sum_i d_{ij}/n_j$,
and, therefore, for the prediction in the expectancy tables we could use the approximation
to (16),

$$y_{ij} \sim N(a_{ij}, d_j).$$

This implies equal variances corresponding to each row of the expectancy table.

For the departments with the largest representation in the GRE VSS data (say, 40
or more students) it may be advantageous to consider separate averages of the posterior
variances for the students with fitted scores in each of the specified ranges.

The conditional probabilities are approximated by the formula

Pr{C_1 < y < C_2 ...

where a is the predicted outcome score (column of the expectancy table), ($C_1$, $C_2$) the range
of scores, and $d_{j1}$ and $d_{j2}$ the corresponding averages of the posterior variances (equal to
a common value $d_j$ for small departments). Note that in Section 7 we recommend that a
common within-department variance be estimated ($\sigma_j^2 = \sigma^2$ for all j).

9. Measures of Quality of the Model Fit

In this section we propose an $R^2$-type coefficient, specific to a department, that would
reflect the quality of the model fit for the data from each department. In the regression
analysis of independent observations (e.g., when there are data from one department only)
we use the familiar $R^2$ defined as

$$R^2 = 1 - \sigma^2/\sigma_{\mathrm{raw}}^2, \qquad (17)$$

where $\sigma^2$ is the residual variance in the assumed model (i.e., with regressors 1, V, Q, A, and
U), and $\sigma_{\mathrm{raw}}^2$ is the raw variance of the outcomes FYA or, equivalently, the residual variance
in the model with no explanatory variables (regressor 1 only). There are two natural
extensions of the definition (17) for the two-level data (students within departments).

For the adopted model (e.g., variables 1, V, Q, A, and U and the covariate) we have
the within-department variances {$\sigma_j^2$} (possibly equal to a common value) and the
between-department variance matrix $\Sigma$. The variance of an observation in this model is
$\sigma_j^2 + x_{ij}\Sigma x_{ij}^{T}$. The two-level analogue of the "empty" model is

$$y_{ij} = \mu + \delta_j + \varepsilon_{ij}, \qquad (18)$$

where $\delta_j \sim N(0, \tau_{\mathrm{raw}}^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_{\mathrm{raw}}^2)$. The variance of an observation in this model is
$\sigma_{\mathrm{raw}}^2 + \tau_{\mathrm{raw}}^2$. Now for the empirical Bayes we can consider two definitions:

A. $R^2 = 1 - (\sigma_j^2 + x_{ij}\Sigma x_{ij}^{T})/(\sigma_{\mathrm{raw}}^2 + \tau_{\mathrm{raw}}^2)$, (19a)

B. $R^2 = 1 - \sigma_j^2/\sigma_{\mathrm{raw}}^2$. (19b)

The definition A is based on the unconditional variance of an observation and the
definition B on the within-department variances. To provide a single figure (percentage)
for each department, using the definition A, $x_{ij}\Sigma x_{ij}^{T}$ in (19a) should be replaced by the
department-mean of these quantities, $\sum_i x_{ij}\Sigma x_{ij}^{T}/n_j$. Both definitions provide measures of
improvement of prediction due to the explanatory variables, and these measures are
department-specific. In practice the variance matrix $\Sigma$ and all the variances are replaced
by their maximum likelihood estimates. The definition B will always yield $R^2$ in the interval
(0,1), and it will be constant across the departments if common within-department variances
are fitted in both the raw and the assumed models; the definition A will yield values of $R^2$
outside (0,1) only in the most pathological cases not expected to arise in the GRE VSS data.

An important advantage of these definitions over those in current use for GRE VSS
reports is that they involve pooling of information across the departments. The sampling
properties of the estimators A and B are only moderately affected by the department size
and the within-department distribution of the GRE and U scores. Thus, the definition of
predicted $R^2$ in Braun and Jones (1985), based on within-department half samples, would
be suitable only for larger departments; for department sizes 10 and smaller, it has a very
large resampling variation because the implied prediction is based on too few observations.
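
The two department-specific coefficients can be sketched as follows. This is an illustration under stated assumptions rather than the report's procedure: definition A uses the department-mean of the quadratic forms as described in the text, definition B is taken to be the ratio of the within-department variance to the raw within-department variance, and all numerical values and function names are hypothetical.

```python
def quadratic_form(x, sigma):
    """x * Sigma * x^T for a row vector x and a square matrix Sigma (nested lists)."""
    n = len(x)
    return sum(x[r] * sigma[r][c] * x[c] for r in range(n) for c in range(n))

def r2_definition_a(x_rows, sigma, sigma2_j, sigma2_raw, tau2_raw):
    """Definition A with x Sigma x^T replaced by its department mean."""
    mean_qf = sum(quadratic_form(x, sigma) for x in x_rows) / len(x_rows)
    return 1.0 - (sigma2_j + mean_qf) / (sigma2_raw + tau2_raw)

def r2_definition_b(sigma2_j, sigma2_raw):
    """Definition B, assumed here to compare within-department variances only."""
    return 1.0 - sigma2_j / sigma2_raw

# Hypothetical department with two students and two random coefficients.
x_rows = [[1.0, 2.0], [3.0, 1.0]]
sigma = [[0.0004, 0.0], [0.0, 0.0009]]
r2_a = r2_definition_a(x_rows, sigma, sigma2_j=0.12, sigma2_raw=0.16, tau2_raw=0.02)
r2_b = r2_definition_b(sigma2_j=0.12, sigma2_raw=0.16)
print(f"definition A: {r2_a:.4f}, definition B: {r2_b:.4f}")
```

In practice the variances and the matrix would be replaced by their maximum likelihood estimates, as the text notes.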

10. Discussion 

Although the main purpose of this report is to design adjustments to the EB 
procedures that would guarantee nonnegative within-department coefficients, it should be 
emphasized that there are no profound reasons why all the coefficients should be 
nonnegative. On the one hand, owing to small department sizes strong evidence of a 
negative coefficient for any particular department is most unlikely. On the other hand, from 
the EB analyses we have evidence that a small (but significant) proportion of the 
departments does have negative coefficients. For example, in several EB analyses of the CE 
dataset both the estimated mean and the estimated standard deviation for the 
within-department coefficients on the quantitative score are about .06. That implies that 
about 100$\Phi$(-1) = 16%¹ of the departments have negative coefficients on the quantitative
score. The unpredictability of the composition of backgrounds of the students of a
department, and the imperfect explanation of the (graduate school) academic performance
in terms of the predictor scores, provide a purely substantive explanation for negative
coefficients. The complex processes of selection and self-selection of students may, purely 
by chance, lead to an apparent negative association of a predictor score with the graduate 
school performance in a small proportion of the departments. 

The regression formula is derived from enrolled students, but its application is 
extrapolated to applicants, who may have much more varied background scores. In addition, 
the fact that FYA is not a perfect measure of academic performance in graduate school will 
cause a distortion of the relationship of the academic performance (as a latent variable) on 
the predictor scores, and since there are a large number of departments, evidence about 
negativeness of some of the coefficients may strengthen. 



¹ $\Phi$ is the distribution function of a standard normal variate.
We believe that in the current procedures negative coefficients probably arise much
more frequently than it would be reasonable to expect. The proposed procedures can 
substantially reduce and, if desired, even eliminate occurrences of negative coefficients. The 
motivation for avoiding negative estimated coefficients is based on the entirely 
understandable inability to provide a case-by-case (or comprehensive) explanation of why 
a particular negative coefficient has arisen, and probably on the belief that negative 
coefficients would be seen as evidence that GRE scores are not very useful predictors. 
Certainly, negative coefficients are difficult to interpret without reference to the complex 
and not very well understood processes of selection and self-selection of the student body, 
and as a consequence, such reports might be regarded by an uninformed client as not useful, 
or suspected to be incorrect. 

However, there are realistic configurations of student background in a department 
for which the true coefficients are negative. After all, we should regard these configurations 
as outcomes of a random process, and so among the large number of departments there are 
bound to be a few with extreme or unexpected configurations that are associated with 
negative coefficients. Therefore, by establishing an unreserved commitment to nonnegative
estimated coefficients, the GRE VSS is threatened with systematic biases in its reports. 

The reported regression formula cannot be used on its own to justify a substantial 
adjustment of the process of selection of students in the coming academic year. The 
formula reflects a mixture of two causes: how the background scores are "converted" into 
academic performance and how successful the selection and self selection processes are. 
Thus, any substantial change of the selection process will affect the relationship of the 
studied scores in the future. In the extreme case, if selection of students were based solely 
on this formula, the selection procedures might be changed over time so dramatically that 
a substantially different formula for the dependence of FYA on GRE scores would then 
apply. Also, in prediction formulas based on models with a covariate (say, V.j) the "new" 
value of V.j (unknown at the time) should be applied. Reliance on small variation of the
covariate across the years is not justified, and the current procedures do not have any means 
for adjustment due to uncertainty about the future value of the covariate, which for most 
departments is the average of a very small number of scores. This raises the issue of the 
covariate as a suitable representation of the context. The GRE VSS should search for a
more valid representation of the context in the employed models. For example, the average
of GRE score means, (V.j + Q.j + A.j)/3, could be considered, but as a contextual covariate
it still suffers from the same ills as V.j: instability, imperfect representation of the context,
and high correlation with the predictor variables. 

As an initial step, information about departments that have provided data for several
years should be collected. For these departments the stability of the estimated coefficients 
as well as the quality of the prediction could be assessed. 

The procedure of selecting separately for each department a model that has no 
negative coefficients is prone to serious biases. In EB analysis (potential) bias is a property 
of the model as applied to the entire dataset. As we select a subset of the data 
(departments with nonnegative coefficients), the estimators with no bias for the entire 
dataset may be substantially biased for the selected subset, especially if this subset is of 
moderate size and if the selection is based on the results themselves. 

11. Recommended Changes in the GRE VSS

We recommend that the program staff investigate the feasibility and cost of the 
following changes in the operational analysis of the GRE VSS: 

1. Limit the use of covariates to such an extent that acute collinearity would not arise. 
In analyses of large datasets, either only one or no covariate should be used. In 
analyses of smaller datasets, such as students with GRE Subject Test scores, the 
model should be substantially reduced; no interactions of covariates with the special 
subject indicators should be used. The guidelines for minimal data sizes for special 
subjects (at present 100 students from at least 10 departments) should be reviewed 
and increased substantially. 

2. The departments that have provided data over several years should be used for cross- 
validation and to provide empirical evidence of stability of the regression coefficients 
in consecutive years. Changes in the values of the covariates (such as V.j) should be
recorded because they are a threat to the usefulness of the GRE VSS reports. The 
current practice of using the last year’s value of V.j in the prediction formula for the 
next year’s should be reviewed. It would be more appropriate to use the latest 
available value for the mean of the verbal scores, or impute its estimate. As an 
alternative, the prediction formulas could be presented in the form requiring the 
department to substitute its current value of V.j. It would not be appropriate to 
substitute the mean verbal score of the applicants because of substantial variation in 
selectivity of the departments. 

3. A common value of the within-department variance $\sigma^2$ should be used. See
Appendix B for technical details. 

4. The extended shrinkage should be implemented and the amount of shrinkage 
recorded. See Appendix A for technical details. 

5. For new datasets, pooled ordinary regression models (ignoring between-department 
variation) with covariate-by-variable interactions should be used to establish the 
extent of the problem with negative coefficients and to assess multicollinearity of the 
regression parameters. 

6. A single model should be used for the report to all the departments in a dataset. 
This would avoid the "report" bias. 

7. The value of the log-likelihood should be used in the choice between candidate 
models. 

8. The procedure for expectancy tables described in Section 8 should be implemented. 
It will produce results very similar to those obtained by the current procedure, except 
that errors due to numerical integration would be largely avoided. 
9. The minimum numbers of departments and students for a subject test dataset should
be reviewed and increased substantially. 
Appendix A



Extended Shrinkage Empirical Bayes Estimation 

The empirical Bayes estimates of the within-department coefficients are given by 
the formula 

$$r_j = (P^{*} + P_j)^{-1}(P^{*}G^{T}Z_j + P_jB_j) \qquad (A.1)$$

where

$P^{*} = \hat\Sigma^{-1}$ is the inverse of the estimated between-department variance matrix,

$P_j = \hat\sigma_j^{-2}X_j^{T}X_j$ is the estimated within-department information matrix,

$B_j = (X_j^{T}X_j)^{-1}X_j^{T}y_j$ is the least-squares estimate of the within-department
coefficients,

G is the estimate of the department-level regression coefficients,

and

$Z_j$ is the vector of covariates for the department j,

(Braun and Jones, 1985). The estimator $r_j$ is a mixture of the within-department estimate
$B_j$, which is unbiased but inefficient, and the pooled regression estimate $b_j^{*} = G^{T}Z_j$, which
is suitable for the "average" department. The advantage of the empirical Bayes method is
in optimal "trading" of unbiasedness for efficiency (lowest mean-square error). The
optimality properties hold under the unrealistic assumption of known variance and
covariance parameters, {$\sigma_j^2$} and the elements of $\Sigma$. The optimality of the empirical Bayes
estimates is under threat when inappropriate models are used and when the estimates of the
variances and covariances are subject to substantial sampling variation. From the form of
(A.1) we can deduce that these problems are particularly acute if the estimates of some of
the within-department variances $\sigma_j^2$ are very small. As observed in Section 7 (see Figure 1)
most of the negative coefficients occur for such departments.
We propose to adapt the empirical Bayes estimator to satisfy the additional constraint
of nonnegativity of the regression coefficients at minimal loss of efficiency. We consider a
more general class of estimators:

$$r_j(c_j) = (P^{*} + c_jP_j)^{-1}(P^{*}b_j^{*} + c_jP_jB_j), \qquad (A.2)$$

where $0 \le c_j \le 1$ is a department-specific constant (to be chosen by the analyst). The
extreme choices are $r_j = r_j(1)$, the empirical Bayes solution, and $r_j(0) = b_j^{*}$, the
least-squares solution for the pooled dataset.

For the simplest department-level design, $Z_j = 1$, we have

$$r_j(0) = G^{T},$$

which is expected to be positive for each dataset. More complex choices for $Z_j$, such as
$Z_j$ = (1, V.j, A.j), where V.j and A.j are the department-means for verbal and analytical
scores, respectively, should be admitted only when $G^{T}Z_j = (g_{11} + g_{12}V_{.j} + g_{13}A_{.j},\; g_{21} + g_{22}V_{.j}
+ g_{23}A_{.j}, \dots)$ have nonnegative components for all values of V.j and A.j that occur in the data.

Let us assume that $r_j(0) = G^{T}Z_j$ is positive for all departments and suppose that the
k-th component of $r_j(1)$ is negative. Then there is a constant $c_j$ such that the k-th
component of $r_j(c_j)$ is equal to zero. This constant can be found by a simple iterative
procedure. As an initial approximation we set

$$c_{j,\mathrm{INI}} = (G^{T}Z_j)_k / \{(G^{T}Z_j)_k - [r_j(1)]_k\}, \qquad (A.3a)$$

where the subscript k denotes the k-th component of the vector. This value of $c_j$ is used to
evaluate (A.2). If the k-th component of the new vector $r_j(c_j)$ is not close enough to zero, we
essentially iterate (A.3a) by updating

$$c_{j,\mathrm{NEW}} = c_{j,\mathrm{OLD}}\,(G^{T}Z_j)_k / \{(G^{T}Z_j)_k - [r_j(c_{j,\mathrm{OLD}})]_k\}. \qquad (A.3b)$$
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The iterative formula (A.3b) would be applied until | [rj(Cj oLD)]k 1 -^02. 

Applying no shrinkage corresponds to Cj = 1. Repeated application of the extended 
shrinkage (more than one negative coefficient for a department) corresponds to the product 
of the shrinkage coefficients. The amount of shrinkage could be effectively monitored by
recording all the departments for which it was employed, together with the shrinkage
coefficients, and a suitable summary would be the total shrinkage $\sum_j c_j$.

As an alternative, the linear Taylor expansion for $r_j(c_j)$, 

$$ r_j(c_j) \approx r_j(1) + (1 - c_j)(P^* + P_j)^{-1} P^* \{b_j^* - r_j(1)\} \qquad \text{(A.4)} $$

at $c_j = 1$, could be used. This approximation can be used iteratively until a constant $c_j$ is 
found for which the $k$-th component of $r_j(c_j)$ is close enough to zero (so that after rounding to 
the usual number of decimal places it would be reported as .0). If a different component 
of $r_j(c_j)$ is negative, the procedure will be repeated for that component. 
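
The expansion (A.4) can be checked by differentiating (A.2) with respect to $c_j$. The following is a sketch of that step rather than the author's own derivation; it assumes the reading of (A.2) in which $P^*$ and $P_j$ are the two precision matrices, $b_j^*$ is the pooled solution $r_j(0)$, and $\beta_j$ is the second vector being combined.

```latex
\begin{align*}
\frac{d}{dc_j}\, r_j(c_j)
  &= (P^* + c_j P_j)^{-1} P_j \{\beta_j - r_j(c_j)\}, \\
\left.\frac{d}{dc_j}\, r_j(c_j)\right|_{c_j = 1}
  &= (P^* + P_j)^{-1} P_j \{\beta_j - r_j(1)\}
   = (P^* + P_j)^{-1} P^* \{r_j(1) - b_j^*\}, \\
r_j(c_j)
  &\approx r_j(1) - (1 - c_j)\,(P^* + P_j)^{-1} P^* \{r_j(1) - b_j^*\}.
\end{align*}
```

The second equality in the middle line uses $(P^* + P_j)\, r_j(1) = P^* b_j^* + P_j \beta_j$, which is (A.2) at $c_j = 1$. The last line shows that reducing $c_j$ below 1 moves the estimate from $r_j(1)$ toward the pooled solution $b_j^*$.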

The procedure based on (A.3b) is much simpler and requires only a moderate 
number of iterations (usually fewer than 6). 



Appendix B 



Common Within-department Variance 

The maximum likelihood estimator for the common within-department variance is 
given by the formula 

$$ \hat{\sigma}^2 = e^T V^{-1} e / N, \qquad \text{(A.5)} $$

where $e$ is the vector of student-level residuals, $e = y - XGZ$, $V$ is the $\sigma^{-2}$-multiple of the 
estimated variance matrix for the observations $y$ (FYA scores), 

$$ V = I_N + \mathrm{diag}_j\{X_j \Omega X_j^T\}, $$

$I_N$ is the $N \times N$ identity matrix, $\Omega = \Sigma/\sigma^2$, $X_j$ is the segment of the design matrix $X$ 
corresponding to department $j$, and $N$ is the number of students in the dataset. The matrix $V$ 
depends on the estimates of the variances and covariances, and therefore it has to be updated at 
every iteration. Since $V$ is a block-diagonal patterned matrix, formulas for evaluation of 
(A.5) without inversion of any large matrices can be employed; see LaMotte (1972) or 
Longford (1987). We have 

$$ V^{-1} = I_N - \sigma^{-2}\, \mathrm{diag}_j\{X_j (\Sigma^{-1} + X_j^T X_j/\sigma^2)^{-1} X_j^T\}, \qquad \text{(A.6)} $$

and hence 

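$$ \hat{\sigma}^2 = e^T e/N - (N\sigma^2)^{-1} \sum_j e_j^T X_j (\Sigma^{-1} + X_j^T X_j/\sigma^2)^{-1} X_j^T e_j, \qquad \text{(A.7)} $$

where $y_j$ and $e_j$ denote the segments of $y$ and $e$ for department $j$, 

$$ e^T e = \sum_j (y_j^T y_j - 2 y_j^T X_j G Z_j + Z_j^T G^T X_j^T X_j G Z_j), $$

and 

$$ X_j^T e_j = X_j^T y_j - X_j^T X_j G Z_j. $$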

Note that $\Sigma^{-1} + X_j^T X_j/\sigma^2$ in (A.6) is equal to $P^* + P_j$. 

In an iterative procedure the residual mean-squared error from the pooled ordinary 
regression can be used as the initial estimate for $\sigma^2$. For the CE dataset the estimate of the 
common variance is about .10. 
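
The saving from the block-diagonal structure can be illustrated numerically. The sketch below evaluates $e^T V^{-1} e / N$ department by department, using only small matrices, and compares the result with a direct computation that builds the full $N \times N$ matrix $V$. It assumes $V = I_N + \mathrm{diag}_j\{X_j (\Sigma/\sigma^2) X_j^T\}$; the function names and the simulated data are hypothetical and serve only as a check.

```python
import numpy as np

def sigma2_blockwise(residuals, designs, Sigma, sigma2):
    """Evaluate e' V^{-1} e / N one department at a time (no N x N inverse)."""
    Sigma_inv = np.linalg.inv(Sigma)              # small p x p matrix
    n_total = sum(len(e_j) for e_j in residuals)
    total = 0.0
    for e_j, X_j in zip(residuals, designs):
        Xe = X_j.T @ e_j                          # X_j' e_j
        M = Sigma_inv + X_j.T @ X_j / sigma2      # p x p matrix for department j
        total += e_j @ e_j - Xe @ np.linalg.solve(M, Xe) / sigma2
    return total / n_total

def sigma2_direct(residuals, designs, Sigma, sigma2):
    """Same quantity from the explicit N x N matrix V, for checking only."""
    e = np.concatenate(residuals)
    N = e.size
    V = np.eye(N)
    start = 0
    for X_j in designs:
        n_j = X_j.shape[0]
        V[start:start + n_j, start:start + n_j] += X_j @ (Sigma / sigma2) @ X_j.T
        start += n_j
    return e @ np.linalg.solve(V, e) / N

# Simulated inputs: three departments, two regressors (numbers are invented).
rng = np.random.default_rng(0)
sizes = [5, 8, 3]
designs = [np.column_stack([np.ones(n), rng.normal(size=n)]) for n in sizes]
residuals = [rng.normal(scale=0.3, size=n) for n in sizes]
Sigma = np.array([[0.04, 0.01], [0.01, 0.02]])    # between-department covariance
sigma2 = 0.10                                     # current variance estimate

print(sigma2_blockwise(residuals, designs, Sigma, sigma2))
print(sigma2_direct(residuals, designs, Sigma, sigma2))
```

The two printed values should agree up to rounding error. In an iterative fit the returned value would replace `sigma2` at the next iteration.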
Table 1 

Ordinary Regression Models for the Departments with Large Numbers of Students 

Department 1 has 102 students; department 229 has 106 students. Standard errors for the 
estimated regression parameters are in parentheses. 

Department 1 

pFYA = 1.355 + .532U + .068V + .055Q - .108A 
              (.103)  (.128)  (.115)  (.114) 

pFYA = 2.926 - .662U + .0723V + .064Q + .082A + .192U^2 + .041A^2 
             (1.026)   (.128)  (.116)  (.492)    (.164)    (.091) 

Department 229 

pFYA = 1.944 + .378U + .121V - .010Q - .007A 
              (.089)  (.072)  (.085)  (.063) 

pFYA = 4.498 - 1.203U + .105V - .026Q + .020A + .248U^2 
              (1.499)  (.074)  (.086)  (.064)    (.234) 


Table 2 



Regression of the Operational Model and Submodel Coefficients on the Covariate V.j 

The operational model is given by (7), that is using the department mean verbal score as 
a covariate. The submodel is obtained from (7) by deleting the covariate, that is, by setting 
$g_{cv} = g_{vv} = g_{qv} = g_{av} = g_{uv} = 0$. The operational software was used to fit these two models 
(500 iterations). The resulting department coefficients were then regressed, using the 
ordinary regression with equal weights, on the department mean score V.j. The standard 
errors corresponding to the ordinary regression are given in parentheses. 



Coefficient    Submodel              Operational Model 

U              .127 + .035V.j        .280 + .020V.j 
                      (.012)                (.013) 

V              .102 - .008V.j        .280 - .008V.j 
                      (.004)                (.004) 

Q              .065 - .002V.j        .193 - .005V.j 
                      (.004)                (.004) 

A             -.006 + .013V.j        .039 - .003V.j 
                      (.003)                (.003) 
Figure 1.a Negative coefficients on the undergraduate grade-point average and fitted within-department variance. 
Figure 1.b Negative coefficients on the verbal score and fitted within-department variance. 
Figure 1.c Negative coefficients on the quantitative score and fitted within-department variance. 
Figure 1.d Negative coefficients on the analytical score and fitted within-department variance. 